@Jack-Khuu Jack-Khuu commented Sep 8, 2025

Small update to the (titan) ReferenceModel signature: forward now takes an Episode as input, matching RefModel in apps/GRPO/main.

Git refuses to pick up the file rename (reference_actor.py => reference_model.py), so I'll call out explicitly what changed in reference_model.py:

  • Deleted code under the experimental section
  • Renamed TitanRefModel => ReferenceModel
  • Updated forward to use Episode (which wraps req/resp)
    Before: async def forward(self, request: list[int], response: list[int]) -> torch.Tensor:
    After:  async def forward(self, episode: 'Episode') -> torch.Tensor:
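For readers skimming the thread, the signature change can be sketched roughly as follows. This is only an illustration: the `Episode` fields (`request_tokens`, `response_tokens`) and the body of `forward` are assumptions, not the definitions in the repo, and plain Python lists stand in for `torch.Tensor` so the snippet runs without PyTorch.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical stand-in; the real Episode may carry different fields."""
    request_tokens: list[int] = field(default_factory=list)
    response_tokens: list[int] = field(default_factory=list)


class ReferenceModel:
    """Toy reference model; the real one returns a torch.Tensor."""

    async def forward(self, episode: Episode) -> list[float]:
        # Old call shape: forward(request, response). The new signature
        # reads both token lists from the single Episode argument.
        input_ids = episode.request_tokens + episode.response_tokens
        start = len(episode.request_tokens)
        # Placeholder scores: one value per response token, derived only
        # from its position so the example stays deterministic.
        return [-float(i - start + 1) for i in range(start, len(input_ids))]


def run_example() -> list[float]:
    episode = Episode(request_tokens=[1, 2, 3], response_tokens=[4, 5])
    return asyncio.run(ReferenceModel().forward(episode))
```

Callers that previously passed `request` and `response` separately would build one `Episode` and pass that instead, which keeps the call site aligned with the RefModel in apps/GRPO/main.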

python apps/grpo/main.py

@meta-cla meta-cla bot added the CLA Signed label Sep 8, 2025

@pbontrager pbontrager left a comment

Looks good. I'd just request that you import the logprobs computation from the one in trainer when it lands.

Another thing to think about: a way to reuse the trainer's user-defined batching logic here instead of using Episode.

Jack-Khuu commented Sep 15, 2025

> import the logprobs computation from the one in trainer when it lands.

trainer.py doesn't have a logprobs calc, though the one in rl/main.py is functionally equivalent to the one here. I'll open a separate PR combining the three (rl/grpo/ref).
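For context on what a shared helper might compute, here is a minimal sketch of a per-token logprobs calculation. It is not the implementation from rl/main.py, apps/grpo, or the reference model; the function name `response_logprobs` and the assumption that `logits[t]` is the row used to predict `response[t]` are illustrative only, and pure Python is used in place of torch so it runs standalone.

```python
import math


def response_logprobs(logits: list[list[float]], response: list[int]) -> list[float]:
    """Log-probability of each response token under per-position logits.

    Assumes logits[t] holds the vocabulary scores used to predict response[t].
    """
    if len(logits) != len(response):
        raise ValueError("need one row of logits per response token")
    out = []
    for row, token in zip(logits, response):
        # Numerically stable log-softmax: subtract the max before exponentiating.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[token] - log_z)
    return out
```

A single function of this shape is the kind of thing the three call sites could import once the follow-up PR consolidates them.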

@Jack-Khuu Jack-Khuu merged commit 0b585c0 into main Sep 15, 2025
5 checks passed
@Jack-Khuu Jack-Khuu deleted the update-refmodel-episode branch September 15, 2025 22:57